Enhancing Exchange-Traded Fund Price Predictions: Insights from Information-Theoretic Networks and Node Embeddings

This study presents a novel approach to predicting price fluctuations for U.S. sector index ETFs. By leveraging information-theoretic measures like mutual information and transfer entropy, we constructed threshold networks highlighting nonlinear dependencies between log returns and trading volume rate changes. We derived centrality measures and node embeddings from these networks, offering unique insights into the ETFs’ dynamics. By integrating these features into gradient-boosting algorithm-based models, we significantly enhanced the predictive accuracy. Our approach offers improved forecast performance for U.S. sector index ETFs and adds a layer of explainability to the existing literature.


Introduction
Classifying listed companies into sectors offers a comprehensive perspective that is pivotal for constructing financial portfolios. The significance of the sector index as a primary boundary in classification systems, encompassing industry groups and sub-industries, is well established in the literature. This sectorial approach grants investors unique advantages, allowing them to harness specific sector opportunities, balance underrepresented sectors, and adopt active investment strategies while considering macroeconomic shifts, momentum, and other crucial factors [1].
However, while many studies have delved into sector indices and ETFs, Leung and Zhao (2021) rightly observed a research gap: there is limited exploration of the price dynamics comparison between sector indices and ETFs. Research examining the nonlinear dependencies and causalities within sector indices, especially from an information-theoretic viewpoint, remains sparse [2].
Our contribution thus stands at the intersection of these identified gaps. The novelty in our approach lies in two key aspects. First, we provide a fresh perspective by delving into the nonlinear network topologies of sector indices in the U.S. market, a dominant global player. This exploration unravels intricate nonlinear information exchanges and flows, showcasing a departure from traditional linearity-based studies.
Second, our study harnesses the potential of these insights to craft predictive models. We tap into their unique predictive power by utilizing nonlinear dependency and causal networks ranging from short-term (20-day window) to long-term (240-day window), incorporating centrality-based measures and novel node embedding techniques.
The manuscript is methodically structured, offering readers a clear progression through our research. Section 2 begins by setting our study within the context of the existing literature and also presenting our dataset. Section 3 offers a thorough explanation of our research methodology, which is intricately linked to the statistical properties of our data.
In Section 4, we embark on the construction and analysis of networks, focusing on U.S. sector indices' returns across varied time frames, while diving deeply into the inherent nonlinear dependencies and causalities. Section 5 serves as a reflective space, discussing the real-world implications of our findings, especially in terms of node-level network measures, the innovative use of node embeddings, and the demonstrable success of our predictive models. Finally, Section 6 wraps up our discourse, providing concluding thoughts on our contributions and hinting at promising avenues for future research.
By decoding the complex interplay and dynamics of these sectors, our research expands the analytical boundaries of sector-based financial research and provides a novel toolkit to better understand and predict sector behaviors and interactions in today's intricate financial landscape.

Data
We used the price and trading volume data of Standard & Poor's U.S. sector index ETFs as the experimental data. There are eleven sector Standard & Poor's Depositary Receipt (SPDR) ETFs: Materials (XLB), Communication Services (XLC), Energy (XLE), Financials (XLF), Industrials (XLI), Technology (XLK), Consumer Staples (XLP), Real Estate (XLRE), Utilities (XLU), Health Care (XLV), and Consumer Discretionary (XLY). In this paper, we only used the data from nine sector ETFs (XLB, XLE, XLF, XLI, XLK, XLP, XLU, XLV, and XLY), because those nine were listed on the New York Stock Exchange (NYSE) Arca Exchange from the beginning, on 16 December 1998, but the other two were not. XLRE was listed on NYSE Arca on 8 October 2015, and XLC was listed on NYSE Arca on 19 June 2018. Those two ETFs have relatively fewer data; thus, we cannot conduct a network analysis and predict their fluctuations under the same conditions. The experimental period is from January 2010 to September 2022. This period is twelve years and nine months long, comprising 51 quarters or 153 months. For the predictive experiment, we designated the period from January 2010 to December 2018 (about 70% of the whole dataset) as the training and validation set and the period from January 2019 to September 2022 (about 30% of the whole dataset) as the testing set. The target data are only the return data of the nine U.S. sector index ETFs. Table 1 provides an overview of the datasets employed throughout our research.
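The preprocessing and chronological split described above can be sketched as follows. This is a minimal illustration, assuming daily prices and volumes are already available as Python lists; the helper names, the simple rate-of-change formula for volume, and the toy numbers are assumptions for illustration, not the study's actual pipeline.

```python
import math
from datetime import date

def log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def rate_of_change(volumes):
    """Simple rate of change (V_t - V_{t-1}) / V_{t-1} (assumed definition)."""
    return [(b - a) / a for a, b in zip(volumes, volumes[1:])]

def chronological_split(dates, values, first_test_day=date(2019, 1, 1)):
    """Split without shuffling: observations before first_test_day go to
    training/validation, the rest to testing."""
    train = [v for d, v in zip(dates, values) if d < first_test_day]
    test = [v for d, v in zip(dates, values) if d >= first_test_day]
    return train, test

# Toy usage with made-up prices (not the study's data).
dates = [date(2018, 12, 27), date(2018, 12, 28), date(2018, 12, 31),
         date(2019, 1, 2), date(2019, 1, 3)]
prices = [50.0, 50.5, 51.0, 50.0, 52.0]
returns = log_returns(prices)  # one element fewer than prices
train, test = chronological_split(dates[1:], returns)
```

Splitting by date rather than at random keeps every test observation strictly later than the training data, which avoids look-ahead leakage in a time series setting.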

Methodology
Based on Shannon's entropy [19] concept, mutual information and transfer entropy serve as nonparametric measures for verifying information exchange between pairs of variables.
Conventional measures of dependency and causality, such as Granger causality [20,21], presuppose data properties such as normality, stationarity, and linearity. However, it is known that stock return-based data usually satisfy only some of these properties. Therefore, we used measures from econophysics and information theory that do not rely on these assumptions and that capture nonlinear relationships when measuring dependencies and causal relationships. Accordingly, we used the concept of mutual information (MI), first proposed by Shannon, and transfer entropy (TE), proposed by Schreiber (2000) [22]. Both are entropy-based measures. Specifically, in this study, we used normalized mutual information (NMI) and transfer entropy (TE) based on Shannon entropy with a permutation test for threshold network construction (Boba et al., 2015) [23].

Mutual Information
Mutual Information (MI) is a measure that captures the shared information between two variables, indicating their statistical interdependence. In the field of information theory, the behavior of a system, say System X, is understood through its probability distribution p(x) and the logarithm of p(x). Based on this idea, the Shannon entropy is as follows:

$$H(X) = -\sum_{x \in X} p(x) \log_2 p(x). \tag{1}$$

Shannon entropy quantifies the information required to identify random values from a discrete distribution. When two subsystems, X and Y, are present in a state of the system, their combined probability distribution is represented by a joint probability, and the joint entropy is

$$H(X, Y) = -\sum_{x \in X, y \in Y} p(x, y) \log_2 p(x, y). \tag{2}$$

Finally, we can define MI as the quantity that identifies the interaction between subsystems:

$$MI(X, Y) = H(X) + H(Y) - H(X, Y). \tag{3}$$
Mutual Information (MI) has been widely utilized in finance for network analysis across various stock exchanges. It has been instrumental in developing networks and selecting portfolios, mainly using short-term data in different markets, and in examining market behaviors during significant changes or events. This approach provides insights into market dynamics and investor sentiments in diverse economic contexts [24][25][26][27][28][29][30].

Normalized Mutual Information (NMI)
One of MI's disadvantages is that it is hard to compare MI values derived from different data. Although MI is always finite for discrete random variables, its maximum value is not constant: it is bounded above by min(H(X), H(Y)), which varies from one pair of variables to another. In other words, it is hard to compare the statistical dependence derived from different datasets. Therefore, we used NMI to compare the U.S. sector index ETF networks within the same range [0, 1]. There are several normalized variants of MI, and their properties differ slightly. In this study, we used NMI normalized by the minimum of the two entropies, as shown in (4), because a normalized version of mutual information should be based on the least upper bound, min(H(X), H(Y)):

$$NMI(X, Y) = \frac{MI(X, Y)}{\min(H(X), H(Y))}. \tag{4}$$

Normalizing by the minimum of the two entropies ensures that the maximum attainable value of NMI is one. This version of NMI is independent of the dimensions of the two discrete variables and of their marginal probabilities [31][32][33][34][35] (Kvålseth, 1987; Banerjee et al., 2005; Kraskov et al., 2005; Vinh et al., 2010; Sarhrouni et al., 2012; Kvålseth, 2017).
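As a rough illustration of MI and the min-entropy normalization, the sketch below estimates both from relative frequencies of discretized series. The function names and the equal-width binning are assumptions made for this example; the study's own discretization scheme is not specified in this section.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy (bits) of a sequence of discrete symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def mutual_information(x, y):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y), with plug-in probabilities."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def normalized_mi(x, y):
    """MI divided by min(H(X), H(Y)), so the result lies in [0, 1]."""
    bound = min(entropy(x), entropy(y))
    return mutual_information(x, y) / bound if bound > 0 else 0.0

def discretize(values, n_bins=3):
    """Equal-width binning (an assumed, illustrative choice)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

x = [0, 0, 1, 1]
same = normalized_mi(x, x)              # identical series: 1.0
indep = normalized_mi(x, [0, 1, 0, 1])  # empirically independent series: 0.0
```

Continuous log returns would first be passed through `discretize` (or another binning rule) before calling `normalized_mi`.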

Transfer Entropy (TE)
Transfer Entropy (TE), based on Shannon entropy and mutual information concepts, is a nonparametric metric for quantifying information transfer between two variables. Unlike Granger causality, which is prediction-oriented, TE focuses on reducing uncertainty, measuring how one variable clarifies the future of another beyond its own past contributions. TE stands out as a model-free approach for identifying causal links in dynamic systems, particularly useful in finance for analyzing connections between various financial entities and market dynamics. This method is renowned for its ability to efficiently pinpoint sources and targets in causal relationships [36].
Transfer Entropy has been a key tool in financial research to explore causal relationships. Studies have examined the connections between credit default swap and bond markets, the causal links among international financial firms, the interplay between exchange rates and stock prices in emerging economies, and the information flow in U.S. equity and commodity markets. This method has proven effective in understanding both internal and cross-market dynamics, demonstrating its versatility in different financial contexts [37][38][39][40].
Building on the entropy concepts introduced earlier, conditional entropy quantifies the amount of information needed to describe the outcome of a random variable, X, given that the value of another random variable, Y, is known. The conditional entropy of X given Y can be expressed as

$$H(X \mid Y) = -\sum_{x \in X, y \in Y} p(x, y) \log_2 p(x \mid y). \tag{5}$$

It can be interpreted as the uncertainty about X when Y is known, or as the expected number of bits needed to describe X when Y is known to both the encoder and the decoder. Based on the above definition, we can define the general form of (k, l)-history TE between two time series, $X_t$ and $Y_t$, for $x_t^{(k)} = (x_t, \ldots, x_{t-k+1})$ and $y_t^{(l)} = (y_t, \ldots, y_{t-l+1})$. The general (k, l)-history transfer entropy can be expressed as follows (Bossomaier et al., 2016):

$$TE^{(k,l)}_{Y \to X}(t) = \sum_{i} p(x_{t+1}, x_t^{(k)}, y_t^{(l)}) \log_2 \frac{p(x_{t+1} \mid x_t^{(k)}, y_t^{(l)})}{p(x_{t+1} \mid x_t^{(k)})}, \tag{6}$$

where $i = \{x_{t+1}, x_t^{(k)}, y_t^{(l)}\}$, $TE^{(k,l)}_{Y \to X}(t)$ is non-negative, and we can drop the time dependency argument, t, for stationary processes. $TE^{(k,l)}_{Y \to X}(t)$ represents the information about the future state of X that is obtained by subtracting the information retrieved from only $X_t^{(k)}$ from the information gathered from both $X_t^{(k)}$ and $Y_t^{(l)}$.
In this study, we focused on the TE with both history lengths set to one, k = l = 1. These settings for lags are typically chosen as they align with the principles of the weak form of the Efficient Market Hypothesis (EMH) and the notion that stock prices follow a random walk pattern [39,40]. Then, we can express the equation of (1,1)-history TE as follows:

$$TE_{Y \to X} = \sum_{i} p(x_{t+1}, x_t, y_t) \log_2 \frac{p(x_{t+1} \mid x_t, y_t)}{p(x_{t+1} \mid x_t)}, \tag{7}$$

where $i = \{x_{t+1}, x_t, y_t\}$.
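To make the (1,1)-history transfer entropy concrete, the following sketch estimates it from relative frequencies of already discretized series. The function name and the toy series are illustrative assumptions; the plug-in estimator shown here is one simple choice and is not necessarily the estimator used in the study.

```python
import math
from collections import Counter

def transfer_entropy(source, target):
    """(1,1)-history transfer entropy from source Y to target X, in bits.

    TE = sum p(x1, x0, y0) * log2[ p(x1 | x0, y0) / p(x1 | x0) ],
    where x1 = x_{t+1}, x0 = x_t, y0 = y_t, using plug-in probabilities.
    """
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_xxy = Counter(triples)                           # (x1, x0, y0)
    c_xy = Counter((x0, y0) for _, x0, y0 in triples)  # (x0, y0)
    c_xx = Counter((x1, x0) for x1, x0, _ in triples)  # (x1, x0)
    c_x = Counter(x0 for _, x0, _ in triples)          # x0
    te = 0.0
    for (x1, x0, y0), count in c_xxy.items():
        p_joint = count / n
        p_full = count / c_xy[(x0, y0)]    # p(x1 | x0, y0)
        p_self = c_xx[(x1, x0)] / c_x[x0]  # p(x1 | x0)
        te += p_joint * math.log2(p_full / p_self)
    return te

# Toy example: X copies Y with a one-step delay, so Y should inform X.
y = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1]
x = [0] + y[:-1]
te_y_to_x = transfer_entropy(y, x)
te_x_to_y = transfer_entropy(x, y)
```

In this toy case `te_y_to_x` exceeds `te_x_to_y`, reflecting the direction in which the series was constructed. On short samples the reverse direction is generally not exactly zero, which is one reason a significance test is applied before drawing an edge.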

Test for Obtaining p-Values of MI and TE
In recent research, transfer entropy has often been explored without considering the finite-size effects arising from sample variations. In this study, we integrate a permutation test with transfer entropy to mitigate these finite-size effects [23,36]. The strength of permutation tests lies in their nonparametric nature: they require minimal assumptions, which makes them well suited to assessing statistical significance when the data-generating mechanism is unknown. To ensure robustness, we shuffled the elements of each time series and derived mutual information and transfer entropy together with their p-values. We conducted 1000 shuffles (M = 1000) to compute the effective transfer entropy (ETE) values. This integration of permutation tests and transfer entropy is our distinct contribution to the literature.
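A permutation test of this kind can be sketched generically for any dependence statistic. The version below shuffles only the source series and uses a one-sided p-value with a +1 correction; both choices, as well as the function names and the toy `agreement` statistic, are assumptions for illustration rather than the exact procedure of the study.

```python
import random

def permutation_p_value(statistic, source, target, n_shuffles=1000, seed=0):
    """One-sided permutation p-value for a dependence statistic.

    Shuffling the source destroys any link with the target while keeping the
    source's marginal distribution. The p-value is the share of shuffles whose
    statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistic(source, target)
    shuffled = list(source)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if statistic(shuffled, target) >= observed:
            exceed += 1
    # The +1 correction keeps the estimate strictly positive (a common convention).
    return observed, (exceed + 1) / (n_shuffles + 1)

def agreement(a, b):
    """Hypothetical toy statistic: number of positions where a and b match."""
    return sum(u == v for u, v in zip(a, b))

series = [0, 1] * 20
observed, p_value = permutation_p_value(agreement, series, series, n_shuffles=200)
```

In practice, `statistic` would be an MI or TE estimator, and the resulting p-value would be compared with the chosen threshold to decide whether an edge is kept.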

Network Analysis
Network analysis offers a unique lens through which we can gain insights into complex social phenomena by representing them as interconnected systems. This approach not only simplifies the depiction of interactions but also provides a structured framework for understanding the intricate dynamics between different entities. In the context of our research, we view the entropy-based dependencies and causal relationships as a web of interactions between sector ETFs. By doing so, we can visualize and examine more closely the ties binding these sector ETFs.
For our study, we leveraged various network topology metrics, transforming them into features for our predictive models. These metrics serve as crucial indicators, helping us navigate the network landscape and understand our dataset's underlying patterns.

Network Theory
Nodes and links are the fundamental components of network theory. As a subject component of a network, a node functions as an interactive agent. A link or connection between two subjects is also an interaction between them. The network type can be classified according to the connection's characteristics. Networks can be classified as directional or non-directional depending on whether their links have a direction. Moreover, if the links carry weights, the network is known as a weighted network; otherwise, it is known as a binary network.
Graphs and matrices are used to represent networks. Both systems have advantages in terms of mathematical processing and visual explanation. A graph format depicts a network and intuitively shows its shape by giving nodes and links shapes, colors, sizes, labels, and arrows. A matrix format describes network attributes as a matrix, often called an adjacency matrix.
We used a weighted undirected graph for MI and a weighted directed graph for TE. Then, we used the p-value from the permutation tests as a threshold for deciding connections between two U.S. sector index ETFs' log returns and trading volumes' rates of change. In other words, the p-values of MI and TE determine the linkages in the resulting networks. Our p-value-based threshold value was 0.1, one of the conventional significance levels.
In detail, each edge (u, v) ∈ E was attributed a weight w(u, v), calculated from one of our statistical dependency measures, and an associated p-value, p(u, v).
To construct a p-value threshold network, we define a p-value criterion under which an edge is considered statistically significant. For a commonly used significance level of α = 0.05, for example, the construction rule retains an edge (u, v) only if its p-value p(u, v) falls below α.
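The thresholding step can be sketched with plain dictionaries. The function name, the adjacency-dict representation, and the transfer entropy values and p-values below are made up for illustration; the default `alpha=0.1` follows the threshold stated in the text.

```python
def threshold_network(weights, p_values, alpha=0.1):
    """Keep the directed edge (u, v) only if its p-value is below alpha.

    weights and p_values map (u, v) pairs to w(u, v) and p(u, v).
    Returns an adjacency dict {u: {v: w(u, v)}} that contains every node,
    including nodes left without outgoing edges.
    """
    nodes = {n for edge in weights for n in edge}
    graph = {n: {} for n in sorted(nodes)}
    for (u, v), w in weights.items():
        if u != v and p_values[(u, v)] < alpha:
            graph[u][v] = w
    return graph

# Made-up transfer entropy values and p-values for three tickers.
weights = {("XLE", "XLF"): 0.08, ("XLF", "XLE"): 0.02, ("XLK", "XLF"): 0.05}
p_values = {("XLE", "XLF"): 0.01, ("XLF", "XLE"): 0.40, ("XLK", "XLF"): 0.09}
graph = threshold_network(weights, p_values, alpha=0.1)
```

For an MI-based undirected network, the same function could be applied after inserting both (u, v) and (v, u) with identical weights and p-values.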

Centrality Measures
Our data were collected from observations within the same system over time. We analyzed topological measures of an evolving network at regular time intervals, yielding time-ordered sequences of topological observations. Our focus was primarily on centrality measures, which are quantified by applying a real-valued function to the vertices of a graph, aiming to rank nodes based on their significance.
The concept of "importance" in a network can be interpreted in various ways, leading to different centrality definitions. One approach conceptualizes importance in terms of network flow or transfer, categorizing centralities based on the type of flow they emphasize. Alternatively, importance can be seen as a node's contribution to network cohesion, leading to centralities that assess cohesiveness.
Different centrality measures consider the number of paths passing through a node, varying in their definition and counting of relevant paths. This approach allows for a classification spectrum ranging from centralities concerned with short paths (like degree centrality) to those involving longer or infinite paths (such as eigenvector centrality). Other measures, like betweenness centrality, focus on a node's role in overall network connectivity. The centrality measures used in our study are detailed in Table 2, together with their respective citations.
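As a small illustration of how node-level measures become model features, the sketch below computes degree-based centralities on a directed weighted network stored as an adjacency dict. Only the simplest short-path measures are shown; the function names and the toy graph are assumptions, and the full set of measures in Table 2 is not reproduced here.

```python
def degree_centralities(graph):
    """In- and out-degree centrality, each normalized by (n - 1)."""
    n = len(graph)
    scale = 1.0 / (n - 1) if n > 1 else 0.0
    out_c = {u: len(nbrs) * scale for u, nbrs in graph.items()}
    in_c = {u: 0.0 for u in graph}
    for u, nbrs in graph.items():
        for v in nbrs:
            in_c[v] += scale
    return in_c, out_c

def out_strength(graph):
    """Sum of outgoing edge weights per node (weighted out-degree)."""
    return {u: sum(nbrs.values()) for u, nbrs in graph.items()}

# Toy directed network {source: {target: weight}} with made-up weights.
toy_graph = {"XLE": {"XLF": 0.08}, "XLF": {}, "XLK": {"XLF": 0.05}}
in_c, out_c = degree_centralities(toy_graph)
strength = out_strength(toy_graph)
```

Computing such values for each rolling-window network yields, for every ETF, a time-ordered sequence of centrality features.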

Node Embeddings
Understanding the intricate relationships between financial assets is of paramount importance. Traditional approaches, however, often miss the nuanced, nonlinear connections inherent within networks of assets, such as U.S. sector index ETFs. This is where node embeddings come to the fore, offering a fresh perspective that traditional methods may overlook.
Node embeddings convert the intricate attributes and relationships of nodes within a network into representative vectors. These vectors encapsulate essential structural details and node-specific features, furnishing us with enhanced capabilities for analysis and prediction. In our research, we leaned into these advantages, focusing specifically on understanding how features derived from node structures within our constructed network could aid in predicting the movement of nine U.S. sector index ETFs.
We harnessed the power of two notable node embedding techniques: Role2vec [69] and FEATHER [70]. While both techniques excel at capturing essential node information, they differ subtly in their focuses. Role2vec zeroes in on the structural characteristics of nodes, and FEATHER brings to light the attributes specific to each node. By employing both, we ensured a comprehensive grasp of the diverse facets of the network, from its broader architecture to the individual attributes of its constituents.
To refine our approach, we adopted 1024-dimensional embeddings and distilled them into more manageable 32-dimensional vectors. This refinement was achieved using UMAP [71], a technique known for preserving global structural information and grounded in Riemannian geometry and topological data analysis.
After deriving the node embeddings in our study, we further integrated them into our prediction framework. Specifically, we based our predictions on networks influenced by both mutual information and transfer entropy. This deliberate integration into the prediction problem enabled us to tap into the information-theoretic network structures.

Role2vec [69]
Ahmed et al. (2018) presented the Role2Vec framework, which employs the flexible concept of attributed random walks and serves as a basis for generalizing existing methods that leverage random walks. Their proposed framework expands the applicability of these methods to transductive and inductive learning, as well as to graphs with attributes (if available). This is accomplished by learning functions that generalize to new nodes and graphs [69].
Role2vec focuses on learning role-based graph embeddings. The core idea is to learn a mapping of nodes to roles in the graph, and then learn embeddings for these roles.

FEATHER [70]
Rozemberczki and Sarkar (2020) proposed a flexible notion of characteristic functions defined on graph vertices to characterize the distribution of vertex features at multiple scales. FEATHER is a computationally efficient algorithm for calculating a particular variant of these characteristic functions in which the probability weights are defined as the transition probabilities of random walks [70].
Rozemberczki and Sarkar (2020) argued that the extracted features are useful for machine learning tasks at the node level. The pooling of these node representations yields compact graph descriptors that can serve as features for graph classification algorithms.
They also demonstrated analytically that FEATHER describes isomorphic graphs using the same representation and is robust to data corruption [70].

UMAP [71]
UMAP is a dimensionality reduction technique that is well suited for visualizing high-dimensional datasets. Developed by McInnes, Healy, and Melville in 2018, UMAP operates based on Riemannian geometry and algebraic topology principles [71].
At its core, UMAP constructs a high-dimensional graph representation of the data and subsequently optimizes a low-dimensional version of this graph to produce a dimension-reduced representation. The method starts by approximating the data's manifold by using a fuzzy simplicial set. The next step involves finding a low-dimensional representation of this set. UMAP builds on three components:
1. Topological Data Analysis: Used to understand the high-dimensional structure of the data.
2. Fuzzy Simplicial Sets: Used to approximate the manifold the data resides on, providing both local and global preservation.
3. Riemannian Geometry: Used to accurately measure distances and maintain data relationships.
Our research employed UMAP to condense the 1024-dimensional vectors obtained from our node embeddings down to a more manageable 32-dimensional space. This reduction was imperative not only for visualization but also for enhancing computational efficiency without significantly compromising the structural integrity of our dataset. The foundation of UMAP in topological data analysis helped retain the global structure of our data, making the resulting low-dimensional embeddings particularly insightful for subsequent analyses.
Integrating node embeddings is not just a technical addition but a substantive step toward bridging the identified research gaps. As our abstract suggests, we aim to bring a layer of explainability that is absent in the existing literature. Node embeddings, especially using Role2vec and FEATHER, allow us to achieve this. We transform abstract financial relationships into tangible, quantifiable data points by converting nodes and their intricate relationships into vectors. This paves the way for integrating these insights into predictive models that forecast with higher accuracy and provide deeper insights into the underlying dynamics.
While many studies have delved into sector indices and ETFs, our adoption of node embeddings elevates our research by emphasizing prediction and understanding. The resulting models are not black boxes, but interpretable tools that shed light on the intricate web of relationships within the U.S. sector index ETFs, marking a significant advancement in financial network analysis.

Machine Learning Algorithms and xAI (eXplainable Artificial Intelligence) Techniques
We mainly used widely adopted machine learning models to predict sector index ETFs' returns, although many state-of-the-art models with good performance exist. For example, although many studies have shown that recurrent neural networks and gradient-boosting algorithms based on them perform very well in various areas, this study aims to extract features that can be used in all machine learning models through a nonlinear measure-based network analysis and to examine their effects. Accordingly, we sought to confirm the performance of well-known machine learning techniques. We used three tree-based machine learning algorithms: XGBoost, LightGBM, and CatBoost [72][73][74]. We used these three prominent models for the following reasons:
1. Interpretability: Tree-based models, at their core, make decisions based on certain conditions, making them more interpretable than many deep learning models. This interpretability is vital in financial sectors, where understanding the reasons behind predictions can be as critical as the predictions themselves.
2. Handling of Mixed Data Types: Financial datasets often consist of numerical and categorical data. Tree-based models like CatBoost are particularly effective at handling categorical variables without extensive preprocessing.
3. Automatic Feature Selection: These models inherently perform feature selection. As a result, they can identify and prioritize the most essential features, which is particularly useful in financial datasets with potentially redundant or less impactful variables.
4. Resistance to Overfitting: With techniques such as gradient boosting and regularization in models like XGBoost and LightGBM, tree models exhibit resistance to overfitting, especially when appropriately tuned.
5. Flexibility: These models can easily capture nonlinear relationships in the data, which are common in financial time series data. Traditional linear models might not capture this nonlinearity as readily.
6. Efficiency and Scalability: Models like LightGBM and CatBoost have been designed with efficiency in mind. They can handle large datasets, making them suitable for comprehensive financial data.
7. Consistency in Results: While deep learning models like RNNs can be potent, they require more meticulous fine-tuning and can sometimes produce inconsistent results due to their complex architectures. In contrast, tree models, once well-tuned, can provide more consistent predictions.
8. End-to-End Modeling: These models do not necessarily require extensive data preprocessing or normalization, making the modeling process more straightforward and sometimes more accurate since no information is lost in preprocessing.

XGBoost
XGBoost (Chen and Guestrin, 2016) [72] is an algorithm that uses the gradient boosting technique proposed by Friedman (2001) [75]. XGBoost is an ensemble algorithm that uses gradient tree boosting to improve its predictors in a sequential manner. Its primary benefit lies in scalability across various situations, making it a highly popular choice for regression and classification tasks.

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{9}$$

$$L(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k), \quad \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2 \tag{10}$$

Equation (9) represents the tree ensemble model, where F is the space of all possible classification and regression trees (CART). The final prediction is made by summing the scores of the corresponding leaves across the trees. Equation (10) is the regularized objective function of the XGBoost model. $l(\hat{y}_i, y_i)$ is a differentiable convex loss function that measures the difference between the predicted and target values, and $\Omega(f_k)$ is a regularization term that prevents overfitting by penalizing the complexity of the model and smoothing the final learned weights. T is the number of leaves in a CART and w is the vector of scores assigned to its leaves, so $\gamma T$ penalizes the number of leaves and $\frac{1}{2} \lambda \lVert w \rVert^2$ penalizes large leaf scores.
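To show how the complexity penalty in the regularized objective behaves, the sketch below evaluates Ω(f) = γT + ½λ‖w‖² for given leaf scores and adds it to a squared-error loss. The function names, the squared-error loss, and the numbers are assumptions for illustration; this is not XGBoost's internal implementation.

```python
def regularization(leaf_weights, gamma=1.0, lam=1.0):
    """Omega(f) = gamma * T + 0.5 * lambda * ||w||^2 for one tree,
    where T is the number of leaves and w holds the leaf scores."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)

def objective(y_true, y_pred, trees, gamma=1.0, lam=1.0):
    """Squared-error loss (an assumed choice of l) plus the penalty of every tree."""
    loss = sum((yp - yt) ** 2 for yt, yp in zip(y_true, y_pred))
    return loss + sum(regularization(w, gamma, lam) for w in trees)

# One tree with two leaves scoring +1 and -1 (made-up values).
penalty = regularization([1.0, -1.0], gamma=0.5, lam=2.0)
total = objective([1.0, 0.0], [0.5, 0.5], [[1.0, -1.0]], gamma=0.5, lam=2.0)
```

Raising `gamma` makes every additional leaf more expensive, while raising `lam` shrinks leaf scores toward zero; both push the ensemble toward simpler trees.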

Light Gradient Boosting Machine (LightGBM)
LightGBM, developed by Ke et al. in 2017 [73], is a gradient-boosting machine learning model that incorporates gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB) to handle large numbers of instances and features efficiently. Its leaf-wise (vertical) tree growth can make it faster and, in many settings, more accurate than level-wise alternatives.
In Equation (11), $f_t(x)$ is a tree, and the objective function is approximated using Newton's method.

CatBoost
CatBoost, introduced by Prokhorenkova et al. in 2018 [74], is a gradient boosting algorithm designed for data with categorical features. It handles categorical features through ordered boosting and decision tree-based techniques. The trees in CatBoost are created by grouping similar instances within the learning dataset, which contributes to its strong performance relative to other gradient boosting methods on such data.

SHAP (Shapley Additive Explanation)
Machine learning shows promise in time series prediction but often lacks explanatory power. Addressing this, Lundberg and Lee (2017) introduced the SHAP method, enhancing interpretability across various machine learning models [76]. SHAP, based on the Shapley value from game theory [77], is a key approach in explainable AI (xAI), elucidating predictions by assessing the impact of individual features. This method calculates average Shapley values using game theory principles, clarifying predictions through the contribution of each data feature.

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \left[ v(S \cup \{i\}) - v(S) \right]$$

Here, $\phi_i$ is the Shapley value of the i-th input variable, and N is the set of all input variables. S is a subset of the input variables that excludes the i-th variable, v(S) is the contribution of the subset S to the result, and $v(S \cup \{i\})$ is the contribution when the i-th variable is included.
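The Shapley value can be computed exactly for a handful of features by enumerating all subsets, which makes the weighting of marginal contributions explicit. The sketch below does this for a toy additive value function; the feature names and contributions are made up, and the SHAP library itself relies on faster model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values for a set function value(frozenset) -> float."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|N| - |S| - 1)! / |N|! for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

# Toy additive "model": each feature contributes a fixed amount (made-up numbers).
contrib = {"degree": 2.0, "betweenness": 1.0, "embedding_0": 0.5}
phi = shapley_values(list(contrib), lambda s: sum(contrib[f] for f in s))
```

For an additive value function each Shapley value equals the feature's own contribution, and the values always sum to the value of the full feature set, which is the property that makes SHAP attributions add up to the prediction.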
In this study, we also generated equally weighted soft-voting regressors to check the average overall performance and analyzed their mean absolute SHAP values.
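Equally weighted soft voting can be sketched as averaging the predicted "up" probabilities of the individual models and thresholding the average. The function name, the 0.5 threshold, and the probabilities below are illustrative assumptions.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average the 'up' probabilities of several models with equal weights.

    probabilities holds one list per model, each with one value per sample.
    Returns the averaged probabilities and the resulting 0/1 labels.
    """
    n_models = len(probabilities)
    averaged = [sum(p) / n_models for p in zip(*probabilities)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels

# Made-up probabilities from three models for two samples.
averaged, labels = soft_vote([[0.9, 0.2], [0.6, 0.4], [0.3, 0.3]])
```

The same averaging idea applies to the explanations: the mean absolute SHAP value of a feature can be averaged across the ensemble members to summarize its overall importance.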

Performance Metrics of Classification Problem
In addition, we compared the predicted values with the real values (fluctuations) using the confusion matrix derived from the classification results. The confusion matrix is typically used to ascertain whether the predicted value was derived appropriately compared with the actual value. In this experiment, the confusion matrix was used to determine the extent to which up or down predictions match the actual fluctuations of U.S. sector index ETFs' prices.
Figure 2 shows a confusion matrix, and Table 3 presents the evaluation metrics derived from it.

F1 Score
Hamming Loss where ŷi,j is the predicted value for the j-th label of a given sample i, y i,j is the corresponding true value, n samples is the number of samples, n labels is the number of labels (in this study, n labels = 2), and 1(x) is the indicator function.
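To make the confusion-matrix metrics concrete, the following sketch computes them for a small up (1) or down (0) example. The formulas are the standard textbook definitions rather than a transcription of Table 3, the labels are made up, and the code assumes that both classes occur in the true and the predicted labels. With a single up/down label per sample, the Hamming loss reduces to the misclassification rate, which is an assumption about how the labels are encoded.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = up)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # sensitivity
    specificity = tn / (tn + fp)
    # Agreement expected by chance, used by Cohen's kappa.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "cohen_kappa": (accuracy - p_chance) / (1 - p_chance),
        "hamming_loss": (fp + fn) / n,  # fraction of wrongly predicted labels
    }


# Hypothetical actual and predicted fluctuations (1 = up, 0 = down).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```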

Exploratory Data Analysis (EDA)
Tables 4 and 5 present the descriptive statistics for our two datasets: the log returns and the rate of change in trading volume for XLB, XLE, XLF, and the other sector ETFs. For the price data in Table 4, the mean values indicate the average return of each security over the studied timeframe. Most mean returns are close to zero, with XLK and XLY showing the highest mean return of 0.0006. The standard deviation reflects volatility or risk: XLE is the most volatile, with a standard deviation of 0.0183, and XLP is the least volatile at 0.0090. The minimum and maximum values capture the extreme returns; XLE saw the largest negative return at −0.2249, while XLF recorded the highest positive return at 0.1524. The quartiles, particularly the medians, are mostly positive and in line with the mean values. The skewness of most securities is negative, suggesting that the left tail (negative returns) extends further than the right. The kurtosis, which is greater than 3 for all securities, points to heavier tails than a normal distribution, implying a higher likelihood of extreme values or outliers.
In Table 5, which covers the trading volume data, XLF has a negative mean, hinting at a general decreasing trend in its trading volume. In terms of volatility, XLP has the most volatile trading volume. The skewness values differ across securities, suggesting varied asymmetry in their volume distributions, and the kurtosis indicates that spikes or sharp drops in trading volume occasionally occur.
Several statistical tests were performed: the Shapiro-Wilk, Kolmogorov-Smirnov, and Jarque-Bera tests for normality; the Ljung-Box test for autocorrelation at different orders; and the Augmented Dickey-Fuller (ADF) test for stationarity. The symbols *, **, and *** denote statistical significance at the 0.1, 0.05, and 0.01 levels, respectively. The results show that none of the data columns in either set of descriptive statistics follow a normal distribution. This provides a quantitative basis for using nonparametric methods that do not rely on assumptions such as normality. For a more detailed analysis of the non-normality seen in Tables 4 and 5, Tables 6 and 7 report the ratios at which the null hypothesis was rejected during normality testing for all generated datasets. Seven normality tests were conducted, including the Shapiro-Wilk test and the D'Agostino K-squared test. More than 80% of the datasets used for calculations failed to meet the criteria for normality. This finding underpins our decision to employ nonlinear, nonparametric measures such as mutual information and transfer entropy, which do not require normality. The central limit theorem (CLT) suggests that, as the sample size grows, the distribution of sample means should approach a Gaussian distribution. However, our exploratory data analysis (EDA) shows a clear trend: as the window length increases, the null hypothesis is rejected more frequently across all normality tests.
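The rejection ratios reported for the rolling windows can be sketched as follows, using only the Jarque-Bera test with its asymptotic chi-square (2 degrees of freedom) p-value, which can be unreliable for short windows. The synthetic heavy-tailed series, the 5% significance level, and the function names are assumptions for illustration; the study applies seven normality tests to the actual ETF data.

```python
import math
import random


def jarque_bera(x):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


def rejection_ratio(series, window, alpha=0.05):
    """Share of rolling windows in which normality is rejected at level alpha."""
    starts = range(len(series) - window + 1)
    rejected = sum(
        1 for s in starts if jarque_bera(series[s:s + window])[1] < alpha
    )
    return rejected / len(starts)


# Hypothetical heavy-tailed "log returns": mostly calm days, occasional shocks.
random.seed(0)
returns = [
    random.gauss(0.0, 0.05 if random.random() < 0.05 else 0.01)
    for _ in range(600)
]

ratios = {w: rejection_ratio(returns, w) for w in (20, 60, 120, 240)}
for window, ratio in ratios.items():
    print(window, round(ratio, 3))
```

Longer windows are more likely to contain at least one shock and give the test more power, which is one plausible reading of why rejections become more frequent as the window length grows.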

Prediction Performance
To ensure the robustness of our experimental results, we conducted 100 iterations, each with a distinct random seed. Table 8 reports the average values of the prediction performance metrics. It compares the results obtained from the original dataset, which consists of price-related data from U.S. sector index ETFs, with the results obtained after adding the proposed features: mutual information (MI) and transfer entropy (TE)-based network embeddings and network topology measures.
The last three columns of Table 8 report the paired t-test results. We employed a paired t-test to statistically validate the observed differences between the mean values obtained with the two feature sets, and ran it on the results of each of the three machine learning techniques. The hypothesis was adjusted according to the metric under consideration. For most metrics, the alternative hypothesis was that the mean value after including the MI and TE network-based columns (network embeddings and network measures) was greater than the corresponding value for the original dataset. For the Hamming loss, where lower is better, we reversed this hypothesis and tested whether the mean value for the original dataset was greater than that obtained after including the MI and TE network-based columns.
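The one-sided paired t-test described above can be sketched as follows. The paired accuracies are synthetic placeholders, not results from Table 8, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only because there are 100 paired runs (99 degrees of freedom).

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t_statistic(after, before):
    """t statistic for H1: mean(after) > mean(before) on paired runs."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom


def one_sided_p_approx(t):
    """Upper-tail p-value using a normal approximation (large samples only)."""
    return 1.0 - NormalDist().cdf(t)


# Hypothetical accuracies over 100 seeds for the two feature sets.
original = [0.50 + 0.001 * (k % 5) for k in range(100)]
proposed = [o + 0.010 + 0.002 * ((k % 3) - 1) for k, o in enumerate(original)]

t_stat, dof = paired_t_statistic(proposed, original)
p_value = one_sided_p_approx(t_stat)
print(round(t_stat, 2), dof, p_value)
```

For the Hamming loss, where the hypothesis is reversed, the same function would be called as `paired_t_statistic(original_loss, proposed_loss)`, with both argument names being placeholders.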
The results are annotated with significance levels: '***' indicates significance at the 0.01 level, '**' at the 0.05 level, and '*' at the 0.1 level. Values with a p-value greater than 0.1 are presented without asterisks.
Table 8 presents a comprehensive evaluation of three machine learning models (XGBoost, LightGBM, and CatBoost) across various metrics and datasets. The datasets under consideration are the original dataset and a dataset enriched with the proposed features, namely the MI and TE-based network-driven features. The table also reports the paired t-test statistic, which indicates the statistical significance of the performance differences observed between the two datasets.
For each sector, such as XLB (Materials) and XLE (Energy), we observed the performance of the three models on both datasets. The dataset with the proposed features often achieves better or comparable results than the original dataset across most sectors and metrics. This indicates that the added features provide valuable information that enhances model performance.
The XGBoost model consistently demonstrates improved performance on the dataset with the proposed features across nearly all sectors. The improvements are especially noticeable in sectors such as XLE (Energy) and XLK (Technology), and their significance is reinforced by the paired t-test statistic.
LightGBM, on the other hand, benefits from the proposed features in sectors such as XLB (Materials) and XLV (Health Care) but shows diminished performance in others, such as XLI (Industrials). This suggests that, while the proposed features enhance model robustness, they might introduce noise or redundancy for specific models or sectors.
CatBoost's performance, compared to the other two models, is generally superior on the proposed dataset, especially in sectors such as XLB (Materials) and XLK (Technology). The t-test values further validate the significance of these observations. Turning to the individual metrics, accuracy, which provides a general sense of model performance, often shows noticeable improvement when models are trained on the dataset with the proposed features. Balanced accuracy, which gives a more nuanced view, especially for imbalanced datasets, mirrors the trends of regular accuracy. Cohen's kappa coefficient, which assesses the agreement between predicted and actual classifications, improves significantly for models such as XGBoost in sectors such as XLE (Energy).
Other metrics, such as precision, recall, F1, and F-Beta scores, provide a detailed view of model performance by considering false positives and false negatives. Across most sectors, the dataset with the proposed features tends to enhance these metrics for all three models, especially for XGBoost and CatBoost. The Hamming loss, which measures the fraction of incorrectly predicted labels, also indicates fewer incorrect predictions for many sectors when the proposed features are used.
In conclusion, Table 8 underscores the potential of the proposed MI and TE-based network-driven features to enhance machine learning model performance across various sectors. While all three models benefit from these features, XGBoost and CatBoost often show the most pronounced improvements. The paired t-test statistics support the value and reliability of incorporating the proposed features into the dataset.

Feature Importance of Causal Network-Related Features
We used the mean absolute SHAP values from the prediction results to elucidate the impact of our causal network-derived measures. Tables 9-14 summarize the top 20 features ranked by mean absolute SHAP value. The choice of the top 20 was guided by an elbow method, which revealed a notable difference around the twentieth feature. For our nine target ETFs and three gradient boosting algorithms (XGBoost, LightGBM, and CatBoost), this yields 20 × 3 × 9 = 540 feature entries for the post-xAI analysis. We systematically categorized these 540 entries to identify patterns, trends, and anomalies more clearly. In Table 9, of the 540 entries, our proposed features constitute 283 (52.41%), while the original features account for 257 (47.59%). Given that the 18 original features comprise nine log-return columns and nine columns for the rate of change in trading volume, each original feature appears, on average, approximately 14.3 times across the 27 models. Notably, the trading volume contributes nearly as much as price, confirming its relevance to our study.
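The ranking and counting procedure can be sketched as follows: average the absolute SHAP values per feature, keep the top k, and count how many of the retained features are proposed versus original. The feature names, the `MI_`/`TE_` prefix convention, and the SHAP values are hypothetical; only the totals in the final lines come from the text.

```python
from collections import Counter


def mean_abs_shap(shap_rows, feature_names):
    """Mean absolute SHAP value per feature (rows are samples)."""
    n = len(shap_rows)
    return {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }


def top_k(importance, k):
    """Names of the k features with the largest mean absolute SHAP value."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


def is_proposed(name):
    # Assumed naming convention: network-derived features start with MI_ or TE_.
    return name.startswith(("MI_", "TE_"))


# Hypothetical SHAP values for two samples and four features.
feature_names = ["logret_XLK", "volchg_XLK", "MI_20_pagerank", "TE_60_role2vec_0"]
shap_rows = [
    [0.10, -0.30, 0.40, 0.00],
    [-0.20, 0.10, -0.60, 0.05],
]

importance = mean_abs_shap(shap_rows, feature_names)
selected = top_k(importance, k=2)
counts = Counter("proposed" if is_proposed(f) else "original" for f in selected)
print(selected, dict(counts))

# Totals reported in the text: 20 features x 3 algorithms x 9 ETFs.
total_entries = 20 * 3 * 9
proposed_share = round(283 / total_entries * 100, 2)
avg_appearances = round(257 / 18, 1)
print(total_entries, proposed_share, avg_appearances)
```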
Table 10 shows that roughly 70% of the node-derived features stem from the MI network, accounting for over a third (36.67%) of the 540 features. This suggests the importance of nonlinear mutual dependencies in analyzing the U.S. sector index. Notably, although nonlinear causal relationships are not ubiquitous, they are significant when they emerge, especially for fluctuation prediction.
Table 11 shows that features derived from the 20-day, 60-day, 120-day, and 240-day windows capture short-term to long-term influences on the fluctuations of U.S. sector index ETFs. Short-term (20-day) features exert a particularly substantial influence, more so with trading volume data than with price-based log-return data. This indicates that nonlinear dependencies and causality hold consistent importance in gradient-boosting algorithm-based predictions.
Our findings, as detailed in Table 12, indicate that centrality measures and node embeddings both play pivotal roles in predicting U.S. sector index ETFs. Most notably, the 32 node embedding-based features account for 130 of the top-20 entries ranked by mean absolute SHAP value, implying that node embeddings are instrumental in enhancing prediction accuracy.
Table 13 highlights that nearly all centrality measures appear among the top 20 features for predicting fluctuations in U.S. sector index ETFs. Specifically, second-order centrality, PageRank, and HITS, which all consider relative connections rather than direct ones, are predominant. This supports our approach of emphasizing interconnectivity and causality through network analysis.
Finally, Table 14 shows that Role2vec (a structural node embedding algorithm) represents 56.92% of the node embeddings, while FEATHER (an attributed node embedding algorithm) is also notable at 43.08%. This motivates exploring a broader range of node embeddings in subsequent studies. Additionally, tuning the embedding vector dimension merits further investigation, as the appropriate dimension varies with the objective.

Discussion
To understand the intricacies of U.S. sector index ETFs' price fluctuations, this study sought to validate the existence and implications of nonlinear dependencies and causal relationships arising from log returns and the rate of change in trading volume. By incorporating mutual information, transfer entropy with permutation tests, and threshold networks, we drew insights from the web of relationships that govern ETF price dynamics. Our methodological approach led to the construction of undirected and directed weighted networks, illuminating the nonlinear dependencies and causal dynamics between the U.S. sector index ETFs.
A significant finding of our research was the potential to enhance the predictive power of machine learning models using centrality measures and node embeddings derived from the topology of these networks. By integrating these measures into our prediction models, we not only achieved superior forecasting accuracy but also improved the explanatory capabilities of our models. The visual representation of information-theoretic dependency and information transfer through the networks further underscored the relationships between the ETFs, offering a richer understanding of their interconnected dynamics.
A cornerstone of our analysis was the use of SHAP, rooted in Shapley values, to quantify the effectiveness of our centrality measures and node embeddings in forecasting the returns of U.S. sector index ETFs. The mean absolute SHAP values affirmed that features derived from our information-theoretic networks, across temporal windows ranging from short-term (20-day) to long-term (240-day), were pivotal in forecasting the fluctuations of U.S. sector index futures. This not only supports using log returns and the rate of change in trading volume as reliable inputs for measuring mutual information and transfer entropy across diverse temporal windows, but also confirms their utility in constructing a variety of networks and harnessing their node-level properties.
Our study points a way forward in forecasting U.S. sector index futures. By harnessing nonlinear measure-based networks and their node-derived features, we not only improved the predictive accuracy of our models but also enriched their explainability. These findings underscore the promise of these techniques and set the stage for future explorations in this domain.

Conclusions
In our quest to understand and predict the dynamics of U.S. stock market sector indices, our study introduced a unique perspective by employing a nonlinear, nonparametric measure-based complex network approach. Harnessing the nuanced insights from information entropy-based measures, we shed light on the intricate information-theoretic relationships embedded within U.S. stock market sector indices' price and trading volume data. By delving into these nonlinear dependencies and causal relationships, we ventured into an under-explored domain and offered a fresh lens for forecasting U.S. sector index ETF prices. Some of the distinct contributions of our study include the following:
• The utilization of information entropy-based measures to discern and showcase the underlying relationships in U.S. stock market sector indices.
• The illustration of nonlinear dependencies and causal relationships in the U.S. market sector index networks, which is an area that has yet to be deeply probed.
• The revelation that nonlinear dependencies and causal relationships can significantly contribute to predictive models, shedding light on new avenues in market forecasting.
• Empirical evidence supports using return-based data to bolster prediction results by probing into the intricate webs formed by return and trading volume networks. This offers a promising direction in enhancing data efficiency by leveraging inter-sectoral relationships without additional external features.
Despite these contributions, our study leaves room for improvement. First, a broader exploration of machine learning and graph embedding techniques could sharpen the analysis and might make our predictions more transparent. Second, tuning the models' hyperparameters could potentially enhance the accuracy of our sector forecasts. Third, while our focus has been specific, incorporating data such as macroeconomic variables could offer a fuller picture and add depth to our findings.
While our research adds to the understanding of the financial sector, it is one step in a longer journey. We look forward to future studies that delve deeper into the intricacies of financial forecasting.
A schematic representation of transfer entropy is shown in Figure 1.

Table 1. Sector ETFs and their descriptions.

Table 2. Centrality measures and their definitions.

Table 6. Rejected ratios of all generated datasets derived from the normality test results (price).

Table 7. Rejected ratios of all generated datasets derived from the normality test results (volume).

Table 9. Comparison of the number of original features and proposed features.

Table 10. Comparison of the number of MI-based features and TE-based features.

Table 11. Comparison of the number of features based on window length.

Table 12. Comparison of the number of centrality measures and node embeddings.

Table 13. Comparison of the number of features based on centrality measures.

Table 14. Comparison of the number of Role2Vec vectors and FEATHER vectors.